Note: Some of the figures in this document use grid.arrange or ggarrange. The heights/widths arguments in grid.arrange() and ggarrange() may have to be tweaked to get the proportions of the panels to look less lopsided.

1 Setup

2 DEG analysis

2.1 Create design matrix

Create design matrix using all permunations of sample codes. 1 and 0 corresponds to which side of the comparison the sample in the sample_names column is. The sample names will just correspond to the rownames of the phenotype table. this gets combined in a for-loop later.

This design matrix is referred to as the “pooled comparison strategy” used in the analysis, as it allows us to look at the differences between different regions without resorting entirely to 1v1 comparisons. This way, we can identify gene signatures unique to a particular region compared to the rest, or to identify gene signatures that may be associated with samples containing similar clonal architectures from the previous clonevol analysis.

2.6 Get GSEA scores for MSigDB oncogenic signatures

First coerce the deg results from DEseq2 into a list of ranks for each comparison using the method described here: https://stephenturner.github.io/deseq-to-fgsea/

Then, we will use fgsea to perform gene set enrichment on all the MSigDB oncogenic signatures.

Generate the enrichment scores & coerce to dataframe

Select the top pathways, filter irrelevant terms, and order the pathways via hierarchical clustering.

# select top pathways & order pathways via hclust
data_list <- lapply(all_fgsea, function(x){
  sig_name <- levels(as.factor(x$signature_name))
  x$pathway <- gsub(paste0(sig_name, '_'), '', x=x$pathway)
  x$sample_comparison <- factor(x$sample_comparison, levels=names(deg_ranks))
  
  df <- x %>%
  filter(padj < 0.05) %>% arrange(padj) %>% group_by(sample_comparison) %>%
  top_n(10, wt=-padj) %>% top_n(10, wt=abs(NES)) %>%
  arrange(abs(NES), -padj) 

  tmp <- df %>% dplyr::select(sample_comparison, pathway, NES) %>%
  group_by(sample_comparison) %>% tidyr::spread(pathway, NES) %>% 
  as.data.frame()
  
  names <- tmp$sample_comparison
  tmp$sample_comparison <- NULL
  tmp <- as.data.frame(lapply(tmp, function(x){
    x[is.na(x)] <- 0
    return(x) }))
  
  rownames(tmp) <- names
  order <- hclust( dist(t(tmp), method = "euclidean"), method = "ward.D" )$order
  df$pathway <- factor(df$pathway,levels=colnames(tmp)[order])


  df <- df[!is.na(df$pathway),]
  df <- df[df$pathway != 'NA',]

  rm(names, tmp)

  return(df)
})

terms <- c('INFLUENZA', 'CELL_CYCLE', 'MITOTIC', 'S_PHASE', 'G1_S_TRANSITION', 'ASTHMA', 'AUTOIMMUNE_THYROID_DISEASE',
'INFECTION','DISEASE', 'INTESTINAL_IMMUNE_NETWORK_FOR_IGA_PRODUCTION','METABOLISM', 'APICAL_SURFACE','GENESIS',
 '43S', 'SYNAP','NEUROTRANSMITTER', 'GABA', 'GLUTAMATE', 'POTASSIUM_CHANNELS', 'CHANNEL_TRANSPORT', 'CELL_LINEAGE',
 'LUPUS', 'NEURONAL',  'RP_DEPENDENT_COTRANSLATIONAL_PROTEIN_TARGETING_TO_MEMBRANE', 'TRANSLATION', 'proteasome', 'complement', 'homologous_recombination',
"peptide_chain",'second_messenger', 'REACTIVE_OXYGEN_SPECIES', 'NONSENSE_MEDIATED_DECAY', 'G2M_checkpoint', 'G2_M_CHECKPOINT', 'IMMUNOREGULATORY_INTERACTIONS_BETWEEN',
'RIBOSOME', 'TIGHT_JUNCTION' ) 


data_filt <- lapply(data_list, function(x){

  y <- x
  for (term in terms){
    y <- y %>% filter(!grepl(term, x=pathway, ignore.case=TRUE))
  }

df <- y

tmp <- df %>% dplyr::select(sample_comparison, pathway, NES) %>%
  group_by(sample_comparison) %>% tidyr::spread(pathway, NES) %>% 
  as.data.frame()
  
  names <- tmp$sample_comparison
  tmp$sample_comparison <- NULL
  tmp <- as.data.frame(lapply(tmp, function(x){
    x[is.na(x)] <- 0
    return(x) }))
  
  rownames(tmp) <- names
  order <- hclust( dist(t(tmp), method = "euclidean"), method = "ward.D" )$order
  df$pathway <- factor(df$pathway,levels=colnames(tmp)[order])


  df <- df[!is.na(df$pathway),]
  df <- df[df$pathway != 'NA',]

  rm(names, tmp)

  return(df)

})

3 Integrate TCGA expression information

3.2 With distributions of specific checkpoint and immune genes

Objective: get a measure (a p-value) of whether the variance between my four samples is greater than if you were to take the variance four random TCGA samples. To do this, do resamplings of 10% of the total possible combinations of 4 tcga-GBM samples from a pool of 155 samples.

Get a measure of whether the variance between my four samples is greater than if you were to take four random TCGA samples.

3.3 plot the checkpoint violins

Note: we saved a pre-computed version of this to reduce runtime and will use it in this RMarkdown document.

Supplementary Figure 3 - overlay onto the PCA

4 Manuscript Figure 4

## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 151 rows containing missing values (geom_label_repel).

6 GSEA for Zhao signatures

Taking the pre-treatment and post-treatment responder and non-responder signatures in figure 3b, and ext data fig 7, do single sample GSEA on the relevant gene sets to see if there is any association.

## # A tibble: 26 x 7
##    pathway               pval   padj   NES  size sample_comparis… signature_name
##    <chr>                <dbl>  <dbl> <dbl> <int> <chr>            <chr>         
##  1 XU_GH1_EXOGENOUS_… 0.00149 0.0104  1.80    92 A, B - vs - C, P PRE-TX_DN     
##  2 GSE21678_WT_VS_FO… 0.00140 0.0104  1.54   162 A, B - vs - C, P PRE-TX_DN     
##  3 GSE17721_LPS_VS_P… 0.00121 0.0169  1.62   180 P - vs - A, B, C PRE-TX_DN     
##  4 FAELT_B_CLL_WITH_… 0.0168  0.0454 -1.57    48 C - vs - A, B, P PRE-TX_DN     
##  5 MARCHINI_TRABECTE… 0.0235  0.0454 -1.67    18 C - vs - A, B, P PRE-TX_DN     
##  6 GSE17721_LPS_VS_P… 0.0264  0.0454 -1.37   184 C - vs - A, B, P PRE-TX_DN     
##  7 XU_GH1_EXOGENOUS_… 0.0163  0.0454 -1.52    91 C - vs - A, B, P PRE-TX_DN     
##  8 MTOR_UP.N4.V1_DN   0.0213  0.0454 -1.37   155 C - vs - A, B, P PRE-TX_DN     
##  9 GSE45739_NRAS_KO_… 0.00256 0.0359 -1.48   174 C - vs - A, B, P PRE-TX_DN     
## 10 GSE21678_WT_VS_FO… 0.0292  0.0454 -1.34   160 C - vs - A, B, P PRE-TX_DN     
## # … with 16 more rows

## Warning: Removed 1359 rows containing missing values (geom_label_repel).

7 RSession Info

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] lattice_0.20-38             ggbeeswarm_0.7.0           
##  [3] fgsea_1.10.1                Rcpp_1.0.3                 
##  [5] ggpubr_0.2.4                magrittr_1.5               
##  [7] org.Hs.eg.db_3.8.2          AnnotationDbi_1.46.1       
##  [9] gridExtra_2.3               ggrepel_0.8.1              
## [11] combinat_0.0-8              forcats_0.4.0              
## [13] stringr_1.4.0               dplyr_0.8.3                
## [15] purrr_0.3.3                 readr_1.3.1                
## [17] tidyr_1.0.0                 tibble_2.1.3               
## [19] ggplot2_3.2.1               tidyverse_1.2.1            
## [21] edgeR_3.26.8                limma_3.40.6               
## [23] DESeq2_1.24.0               SummarizedExperiment_1.14.1
## [25] DelayedArray_0.10.0         BiocParallel_1.18.1        
## [27] matrixStats_0.55.0          Biobase_2.44.0             
## [29] GenomicRanges_1.36.1        GenomeInfoDb_1.20.0        
## [31] IRanges_2.18.3              S4Vectors_0.22.1           
## [33] BiocGenerics_0.30.0        
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1       ggsignif_0.6.0         htmlTable_1.13.2      
##  [4] XVector_0.24.0         base64enc_0.1-3        rstudioapi_0.10       
##  [7] bit64_0.9-7            fansi_0.4.0            lubridate_1.7.4       
## [10] xml2_1.2.2             splines_3.6.1          geneplotter_1.62.0    
## [13] knitr_1.26             zeallot_0.1.0          Formula_1.2-3         
## [16] jsonlite_1.6           broom_0.5.2            annotate_1.62.0       
## [19] cluster_2.1.0          compiler_3.6.1         httr_1.4.1            
## [22] backports_1.1.5        assertthat_0.2.1       Matrix_1.2-17         
## [25] lazyeval_0.2.2         cli_1.1.0              acepack_1.4.1         
## [28] htmltools_0.4.0        tools_3.6.1            gtable_0.3.0          
## [31] glue_1.3.1             GenomeInfoDbData_1.2.1 fastmatch_1.1-0       
## [34] cellranger_1.1.0       vctrs_0.2.0            nlme_3.1-142          
## [37] xfun_0.11              rvest_0.3.5            lifecycle_0.1.0       
## [40] XML_3.98-1.20          MASS_7.3-51.4          zlibbioc_1.30.0       
## [43] scales_1.0.0           hms_0.5.2              RColorBrewer_1.1-2    
## [46] yaml_2.2.0             memoise_1.1.0          rpart_4.1-15          
## [49] latticeExtra_0.6-28    stringi_1.4.3          RSQLite_2.1.2         
## [52] genefilter_1.66.0      checkmate_1.9.4        rlang_0.4.1           
## [55] pkgconfig_2.0.3        bitops_1.0-6           evaluate_0.14         
## [58] labeling_0.3           htmlwidgets_1.5.1      cowplot_1.0.0         
## [61] bit_1.1-14             tidyselect_0.2.5       R6_2.4.1              
## [64] generics_0.0.2         Hmisc_4.3-0            DBI_1.0.0             
## [67] pillar_1.4.2           haven_2.2.0            foreign_0.8-72        
## [70] withr_2.1.2            survival_3.1-7         RCurl_1.95-4.12       
## [73] nnet_7.3-12            modelr_0.1.5           crayon_1.3.4          
## [76] utf8_1.1.4             rmarkdown_1.17         locfit_1.5-9.1        
## [79] readxl_1.3.1           data.table_1.12.6      blob_1.2.0            
## [82] digest_0.6.22          xtable_1.8-4           munsell_0.5.0         
## [85] beeswarm_0.2.3         vipor_0.4.5